Abstract

In view of the critical behavior exhibited by a statistical model
describing a one-dimensional crystal containing isotopic impurities,
and of the fact that for an impurity mass 3 times the host mass
the critical temperature can attain very high values, we suggest that
the mosaic structure displayed by certain nucleic acids (DNA) of
eukaryotic organisms might have originated in a condensation of codons
that occurred under prebiotic conditions. An appropriate map of that
model onto some features of DNA allows one to predict power-law
behavior for both the correlation functions and the exon size
distribution in binary sequences of nucleotides distinguished by their
protein-coding or noncoding function. Preliminary studies of these
quantities, performed on intron-containing sequences from GenBank,
are presented here.



1 Introduction 



Nucleotide sequences in the DNA of higher organisms display a mosaic
structure in which expressed sequences (exons) are interrupted by
intervening sequences (introns) that do not code for amino acids [1]. To
date, the role of introns has not been unraveled, although it is believed
that their presence in the eukaryotic genome promotes biological stability,
since mutations in noncoding regions are not expected to affect the
information content regarding protein synthesis [2].

Uncertainty about whether introns were introduced into, or withdrawn
from, embryonic sequences of nucleotides has cast doubt on the common
assumption that the prokaryotic genome is primitive relative to the
eukaryotic genome. The dispute between the introns-early [3] and
introns-late [4] hypotheses remains unresolved.

It is worth noting in this respect that there is evidence favoring the
presence of exons of all sizes in eukaryotic sequences, and the existence
of a close relationship between the distribution of exons along such
sequences and protein structure [5]. The question of whether such a
distribution has a prebiotic origin is then closely related to the
introns-early/introns-late dilemma [?].

In the last few years there have been numerous efforts to rationalize the
genetic information stored in DNA nucleotide sequences from a statistical
viewpoint [?]. Apart from methodological differences in the use of the
GenBank database, these works present numerical information on the
two-point autocorrelation function C(r) in one-dimensional space (or on its
Fourier transform, the power spectrum), for nucleotides separated by a
distance r (in appropriate units) on the same DNA strand. The essential
question addressed in these works is whether such sequences exhibit
long-range correlations (or, equivalently, power-law behavior) when
nucleotides are distinguished according to their pyrimidine (cytosine,
thymine) or purine (adenine, guanine) base content. Criteria based on the
number of hydrogen bonds linking complementary nucleotides in pairs have
also been used.

In this context, the relevance of the coding and noncoding parts of
eukaryotic sequences for interpreting the genetic information of DNA has
been emphasized since S. Nee's suggestion that the ultimate source of the
long-range correlations reported in early studies [?] could be the
underlying mosaic structure of the genome [?]. This observation has led to
further studies of purine and pyrimidine organization along separate coding
and noncoding sequences. Nevertheless, the question of whether the mosaic
structure of the genome is responsible for signaling long-range order in
the purine-pyrimidine distribution remains under debate [?].

From a physical viewpoint, one interest in studying statistical properties
of the genome lies in the fact that, by finding long-range behavior for
C(r), one might attribute the origin of the organization of nucleotide base
sequences to a critical phenomenon that occurred during biological
evolution. In this respect, one can find in the literature nonequilibrium
statistical mechanics models proposed to describe the dynamics of
nucleotide sequences under the action of some natural evolutionary driving
processes [?].

We choose to address questions related to genome organization by
distinguishing nucleotides not as purines or pyrimidines but, regardless of
their base content, according to their location in an exon or intron region
of a DNA sequence. We believe that direct information on the statistical
properties of the mosaic structure will be helpful for tracing the origins
and functions of introns, which is our main concern here.

The focus of our studies on nucleotide sequences was suggested entirely
by the results obtained for the equilibrium properties of a statistical
mechanics model proposed recently to investigate conditions for isotopic
order (or "isotopic fractionation") in a harmonic crystal chain containing
isotopic impurities [11]. On the basis of an appropriate map (see below), we
argue that this model is also suitable for describing some features of DNA,
in particular its mosaic structure.

The relevant aspect of the equilibrium thermodynamics of the original
model that we want to emphasize here is that, at temperatures below a
certain critical temperature $T_c$, it displays a condensed phase in which
impurities aggregate into clusters of all sizes. Within this phase, the
cluster distribution function and the two-point autocorrelation functions
are expected to exhibit temperature-dependent power-law decay [12].
Moreover, it was also found in Ref. [?] that, for masses $m_a$ of host
particles and $m_b$ of impurities satisfying the particular relationship

$$ m_b \sim 3 m_a \qquad (1) $$

the critical temperature for condensation can attain very high values [13].

Drawing a parallel between this model and a similar description of the DNA
molecule allows us to conjecture that the DNA mosaic structure might have
originated in a condensation of codons [14] into clusters (or exons) that
occurred during the prebiotic synthesis of some nucleic acids. Within this
picture, we suggest that the thermodynamic stability of DNA with respect to
small spatial fluctuations of its molecular components played a crucial
role in establishing an ordered state (the mosaic structure) in which
codons appear segregated from noncoding regions. It is important to stress
that the present proposal does not relate to the emergence of particular
sequences of purine and pyrimidine bases; it is restricted to the mosaic
features, regardless of base content.



In Section 2 we review the model of Ref. [11] and extend its results in
order to estimate the asymptotic behavior of the related pair correlation
functions and cluster size distribution. On the basis of a map of this
model onto DNA nucleotide sequences, suggested in Section 3, we show some
numerical results for these quantities obtained in preliminary studies
using data from GenBank. A discussion is given in Section 4.



2 A model Hamiltonian 



Here we review the original model of Ref. [11] and explore some of its
statistical properties to describe aspects of systems of interest.



2.1 Isotopic order: phonon-induced interactions

As mentioned, the original lattice model Hamiltonian describes a
one-dimensional crystal containing isotopic impurities, with particles
interacting through a harmonic potential:

$$ H = \sum_{i=1}^{N} \left\{ \frac{m_i}{2}\,\dot{u}_i^2 + \frac{K}{2}\,(u_i - u_{i-1})^2 \right\} \qquad (2) $$

where $u_i$ and $m_i$ are, respectively, the displacement from the
equilibrium position and the mass of the particle at lattice site i, and K
is a common force constant. Two isotopic species are present in this
system, A (host) and B (impurity), with masses $m_a$ and $m_b$,
respectively. By introducing site-dependent spin-like variables $\sigma_i$,
which assume the value 0 (for the host species) and 1 (for an impurity), it
is possible to write $m_i$ as

$$ m_i = m_a + (m_b - m_a)\,\sigma_i \qquad (3) $$



Now, it is assumed that this system can be driven into a region of
sufficiently high temperatures where the particles freely interchange
positions with each other. The nature of such processes is immaterial here,
provided that their characteristic times are very small compared to typical
lattice relaxation times. The question posed in Ref. [11] is whether, in
this situation, the phonons of the lattice could induce positional
correlations among impurities, leading to some sort of isotopic order at
thermodynamic equilibrium. To answer it, we introduce phonon creation
($b_q^\dagger$) and phonon annihilation ($b_q$) operators through
Eqs. (4) and (5), and rewrite H in (2) as Eq. (6).

[Equations (4)-(8) did not survive extraction. The legible fragments are:
sums over q with a normalization factor containing $2 N m_a \omega_q$ in
Eqs. (4) and (5); the combination $B_q = b_q - b_{-q}^\dagger$; a
mass-contrast coefficient $\alpha$ proportional to $(m_b/m_a - 1)$; and a
lattice sum over sites i of phase factors $e^{iki}$.]

Here

$$ \omega_q = \left[ 2K (1 - \cos q)/m_a \right]^{1/2} $$

is the dispersion relation for free phonons, for

$$ q \in \left\{ -\pi, -\tfrac{N-2}{N}\pi, \ldots, -\tfrac{2\pi}{N}, 0, \tfrac{2\pi}{N}, \ldots, \pi \right\}. $$
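As a numerical illustration of the free-phonon dispersion relation quoted above, the following minimal Python sketch evaluates $\omega_q$ on the allowed wavevectors. The function names, the default units ($K = m_a = 1$) and the restriction to an even number of sites N are assumptions made for this example only; they are not part of the original model description.

```python
import math


def allowed_wavevectors(n_sites):
    """Return q = 2*pi*k/N for k = -N/2, ..., N/2, i.e. the grid from -pi to pi.

    Assumption of this sketch: N is even, so that both endpoints -pi and pi
    appear, as in the set quoted in the text.
    """
    if n_sites <= 0 or n_sites % 2 != 0:
        raise ValueError("this sketch assumes a positive, even number of sites")
    half = n_sites // 2
    return [2.0 * math.pi * k / n_sites for k in range(-half, half + 1)]


def free_phonon_frequency(q, force_constant=1.0, host_mass=1.0):
    """Dispersion relation omega_q = [2K(1 - cos q)/m_a]^(1/2)."""
    return math.sqrt(2.0 * force_constant * (1.0 - math.cos(q)) / host_mass)


if __name__ == "__main__":
    for q in allowed_wavevectors(8):
        print(f"q = {q:+.4f}   omega_q = {free_phonon_frequency(q):.4f}")
```

The frequency vanishes at q = 0 and reaches its maximum, $2 (K/m_a)^{1/2}$, at $q = \pm\pi$.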



To approach the equilibrium properties of this system, we consider its
grand partition function at fixed temperature:

$$ \Xi(\beta,\mu) = \sum_{\{\sigma_i\}} \sum_{\{n_q\}} \langle \{n_q\} |\, e^{-\beta (H - \mu \sum_i \sigma_i)} \,| \{n_q\} \rangle \qquad (9) $$

where the sums extend over all spin configurations $\{\sigma_i\}$ and over
the phonon occupation numbers $\{n_q\}$ of the modes q, $\mu$ is a chemical
potential that controls the density of impurities, and H is as in (6).
Integration over the phonon variables allows one to write $\Xi$ as

$$ \Xi(\beta,\mu) = Z_0 \sum_{\{\sigma_i\}} e^{-\beta \Delta F}\, e^{\beta \mu \sum_i \sigma_i} \qquad (10) $$

where

$$ \Delta F \equiv -\beta^{-1} \ln \frac{\sum_{\{n_q\}} \langle \{n_q\} | e^{-\beta H} | \{n_q\} \rangle}{\sum_{\{n_q\}} \langle \{n_q\} | e^{-\beta H_0} | \{n_q\} \rangle} \equiv -\beta^{-1} \ln Z/Z_0 \qquad (11) $$

is an effective interaction among impurities induced by the lattice
phonons. $Z_0$ is a normalization constant defined to set $\Delta F(0) = 0$.

We have found that, for a given spin configuration and low impurity
density, $\Delta F$ separates into a sum

$$ \Delta F = \sum_{r} \Delta F_l(r) \qquad (12) $$

such that $\Delta F_l$ is the effective energy of an isolated cluster of
impurities (indexed by r) containing l components; namely, an l-cluster [15].

This enables us to focus on an isolated l-cluster and extract from
$\Delta F_l$ a cluster "surface energy" term $E_l$ whose asymptotic
behavior as l goes to infinity is given by Eq. (13) [the equation is
missing from this copy; relations (19) and (21) require
$E_l \sim \beta_c^{-1} \ln l$, with $\beta_c$ defined by (22)], with

$$ \Delta = \frac{m_b + m_a}{m_b - m_a} \qquad (14) $$

and a cluster bulk energy $l\Phi$, in agreement with the model studied in
Ref. [?]. Accordingly,

$$ \Delta F_l \sim l\Phi + E_l \qquad (15) $$
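To make the role of the mass parameter concrete, the short Python sketch below evaluates $\Delta$ of Eq. (14) for a few mass ratios. The function name and the sample ratios are illustrative assumptions; the only input taken from the text is the definition $\Delta = (m_b + m_a)/(m_b - m_a)$.

```python
def delta_parameter(host_mass, impurity_mass):
    """Delta = (m_b + m_a) / (m_b - m_a), as defined in Eq. (14)."""
    if impurity_mass == host_mass:
        raise ValueError("Delta is undefined for equal masses")
    return (impurity_mass + host_mass) / (impurity_mass - host_mass)


if __name__ == "__main__":
    # Host mass set to 1, so the second argument is the ratio m_b / m_a.
    for ratio in (2.0, 2.5, 3.0, 3.5, 4.0):
        print(f"m_b/m_a = {ratio:.1f}   Delta = {delta_parameter(1.0, ratio):.4f}")
```

For $m_b = 3 m_a$ the function returns exactly 2, the value of $\Delta$ that the text associates with very high critical temperatures.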

In the following we make further considerations to discuss the nature of
the phase transition exhibited by this model, on the basis of equations
(12) and (15).

2.2 Cluster Size Distribution



It is our purpose here to characterize the phase transition displayed by
the model above in the limit of very low impurity density. For this, we
replace the sum over spin variables in (10) by a sum over all
configurations of isolated impurity clusters of all sizes and positions.
Within this approximation we treat the system as an ideal lattice gas
mixture of different molecular species.

Let $a_l$ be the number of l-clusters in a given configuration of the
system. Then (15) becomes

$$ \Delta F \sim \sum_{l=1}^{\infty} a_l\,(l\Phi + E_l) \qquad (16) $$

and (10) can be approximated by Eq. (17), where L is the chain length and
$z_l = e^{\beta \mu_l}$ is the l-cluster activity, with $\mu_l$ being the
corresponding chemical potential. At thermodynamic equilibrium,

$$ \mu_l = l\mu \qquad (18) $$

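Defining an l-cluster internal partition function $q_l$ as

$$ q_l = \exp[-\beta (l\Phi + E_l)] \qquad (19) $$

expression (17) reads

$$ \Xi(\beta,\mu) \sim Z_0 \prod_{l=1}^{\infty} e^{L z_l q_l} \qquad (20) $$

from which the average number of l-clusters per unit length,
$\rho_l(\beta,\mu)$, can readily be obtained:

$$ \rho_l(\beta,\mu) = \lim_{L\to\infty} \frac{1}{L}\, z_l \left( \frac{\partial \ln \Xi}{\partial z_l} \right)_{L,T} = l^{-\beta/\beta_c} \exp[-\beta l (\Phi - \mu)] \qquad (21) $$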



In expression (21), we have used relation (13) and defined a critical
inverse temperature $\beta_c$ by

[Eq. (22): $\beta_c$ expressed in terms of $K/m_a$ and $\pi (2 - \Delta)$;
the powers and prefactors are not legible in this copy.]



On summing (21) over all values of l, we obtain the average number of
clusters of any kind per unit length; namely,

$$ \nu(\beta,\mu) = \sum_{l=1}^{\infty} \rho_l(\beta,\mu) = \sum_{l=1}^{\infty} l^{-\beta/\beta_c} \exp[-\beta l (\Phi - \mu)] \qquad (23) $$

so that the fraction of l-clusters at thermodynamic equilibrium or,
equivalently, the probability of finding an l-cluster in the system, is
given by

$$ P_l(\beta,\mu) = \frac{\rho_l(\beta,\mu)}{\nu(\beta,\mu)} \qquad (24) $$

Let us now review the properties of the sum in (23):

(a) If $e^{-\beta(\Phi-\mu)} < 1$, then $\nu$ converges for $\beta > 0$;

(b) If $e^{-\beta(\Phi-\mu)} = 1$, then $\nu$ converges for
$\beta > \beta_c$, or $T < T_c = 1/k_B \beta_c$.

The last condition expresses the critical behavior of the model, which
becomes apparent by studying the limit



$$ \lim_{\mu\to\Phi} P_l(\beta,\mu) = \frac{l^{-\beta/\beta_c}}{\lim_{\mu\to\Phi} \nu(\beta,\mu)} \qquad (25) $$



If $T > T_c$ then $\lim_{\mu\to\Phi} \nu(\beta,\mu) \to \infty$, so that for
temperatures higher than $T_c$ the probability of finding stable clusters
of relatively large sizes is negligible. However, this limit attains finite
values in regions where $T < T_c$. Consequently, at sufficiently low
temperatures the statistical properties of the system are governed by the
Levy distribution

$$ P_l(\beta > \beta_c, \Phi) \sim l^{-\beta/\beta_c} \equiv l^{\eta} \qquad (26) $$

with

$$ \eta = \eta(\beta) = -\beta/\beta_c \qquad (27) $$

which characterizes its condensed phase. Within this phase, macroscopic
regions of impurities can be found at all scales.
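The convergence conditions (a) and (b) can be checked numerically. The following Python sketch evaluates truncated versions of the sum $\nu$ and of the probability $P_l$ from the closed form on the right-hand side of Eq. (21). The function names, the parametrization by the two dimensionless numbers $\beta/\beta_c$ and $\beta(\Phi - \mu)$, and the truncation at a finite `l_max` are assumptions of this illustration, not part of the original derivation.

```python
import math


def cluster_density(l, beta_ratio, beta_gap):
    """rho_l = l^(-beta/beta_c) * exp(-l * beta * (Phi - mu)).

    beta_ratio is beta/beta_c and beta_gap is beta*(Phi - mu) >= 0.
    """
    return l ** (-beta_ratio) * math.exp(-beta_gap * l)


def partial_nu(beta_ratio, beta_gap, l_max):
    """Truncated sum of rho_l for l = 1, ..., l_max."""
    return sum(cluster_density(l, beta_ratio, beta_gap) for l in range(1, l_max + 1))


def cluster_probability(l, beta_ratio, beta_gap, l_max):
    """P_l = rho_l / nu, with nu replaced by its truncated sum."""
    return cluster_density(l, beta_ratio, beta_gap) / partial_nu(beta_ratio, beta_gap, l_max)


if __name__ == "__main__":
    # At mu = Phi (beta_gap = 0) the sum settles only when beta/beta_c > 1.
    for beta_ratio in (0.5, 2.0):
        sums = [partial_nu(beta_ratio, 0.0, n) for n in (10**2, 10**3, 10**4)]
        print(beta_ratio, [round(s, 4) for s in sums])
```

With `beta_gap = 0` and `beta_ratio = 2.0` the partial sums approach $\pi^2/6$, whereas for `beta_ratio = 0.5` they keep growing with `l_max`, in line with condition (b).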

We should note that result (26), which was obtained here for
noninteracting clusters, is expected to hold for this class of models even
in more general cases, such as those studied in Ref. [12].

2.3 Pair Correlation Functions

In addition to the size distribution, we can also make quantitative
predictions on the behavior of the impurity pair correlation functions,
which in the present context are defined by

$$ C(|r|) = \langle \sigma_i \sigma_{i+r} \rangle - \langle \sigma_i \rangle \langle \sigma_{i+r} \rangle \qquad (28) $$

for |r| measuring the distance between two impurity particles along the
chain. The variables $\{\sigma_i\}$ are assigned according to (3), and
$\langle \ldots \rangle$ indicates averages over the statistical ensemble
of the system.

Since the only contribution to the truncated function (28) comes from
configurations in which the two points considered are covered by a single
cluster, we can use the approach introduced in the last Section and
estimate the behavior of C(|r|) in the condensed phase by the probability
of finding any cluster of size greater than or equal to |r|. Using result
(26), we then find the asymptotic power-law behavior

$$ C(|r|) \sim \sum_{l=|r|}^{\infty} P_l(\beta > \beta_c, \Phi) \sim |r|^{\varepsilon} \qquad (29) $$

with

$$ \varepsilon = \varepsilon(\beta) = 1 - \beta/\beta_c \qquad (30) $$

as expected [16], [17]. We remark that $1 - \beta/\beta_c < 0$ within this
phase.
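The step from the size distribution to the correlation exponent, a tail sum of $l^{\eta}$ decaying as $|r|^{1+\eta}$, can be verified numerically. The Python sketch below is an illustration under stated assumptions: the infinite sum in Eq. (29) is truncated at a finite `l_max`, and the exponent is estimated from the ratio of tail sums at r and 2r. Both choices, and the function names, are ours.

```python
import math


def tail_sum(r, eta, l_max):
    """Sum of l**eta for l = r, ..., l_max (truncated version of the sum in Eq. (29))."""
    return sum(l ** eta for l in range(r, l_max + 1))


def local_exponent(r, eta, l_max):
    """Estimate d ln C / d ln r from the tail sums at r and 2r."""
    ratio = tail_sum(2 * r, eta, l_max) / tail_sum(r, eta, l_max)
    return math.log(ratio) / math.log(2.0)


if __name__ == "__main__":
    eta = -2.0  # corresponds to beta/beta_c = 2 in Eq. (27)
    print("estimated exponent:", local_exponent(100, eta, 200000))
    print("predicted exponent:", 1.0 + eta)
```

The estimate approaches $1 + \eta = 1 - \beta/\beta_c$ as r grows, provided `l_max` is much larger than r so that truncation effects stay small.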



3 Modeling Eukaryotic Sequences 
3.1 Map onto DNA 

It is our intention in this Section to suggest a map of the model reviewed
above in order to describe aspects of eukaryotic DNA sequences. For this
purpose, it is relevant to notice in Eq. (14) that for $\Delta \sim 2$,
i.e., for

$$ m_b \sim 3 m_a \qquad (31) $$

the critical temperature of condensation $T_c$ can attain arbitrarily large
values [see Eq. (22)]. This result leads us to make some conjectures
concerning the organization of coding and noncoding regions along some
nucleotide sequences:

(i) Let us assume that small positional displacements of nucleotides with
respect to their equilibrium positions along the DNA helix backbone, which
in turn generate phonons, are relevant degrees of freedom of the
macromolecule and can be accounted for by the lattice model Hamiltonian in
Eq. (2). For this, we also assume that the two complementary DNA strands
can be represented by a single chain of harmonic oscillators placed along
that direction. Accordingly, we shall consider that any two nucleotides
linked by H-bonds in a complementary pair are, under certain conditions,
constrained to move along the chain direction as a single particle of mass
(Fig. 1a):

$$ m + m_A + m_T = m_{AT} \qquad (32) $$

or

$$ m + m_C + m_G = m_{CG} \qquad (33) $$

where $m_{A(T,C,G)}$ is the mass of an adenine (thymine, cytosine or
guanine) base and m is the mass of the remaining sugar and phosphate
components of the nucleotides.

FIGURES 1(a), 1(b)

Because either complementary pair, AT or CG, comprises a two-ring purine
base and a one-ring pyrimidine base joined to form a three-ring base pair
(see Fig. 1b), we can ascribe, within reasonable approximation, the mass
$m_a$ of a host particle defined in the model for isotopes to the mass of
any combination A-T or C-G, i.e.

$$ m_a \sim m_{AT} \sim m_{CG} \qquad (34) $$

(ii) In addition, we consider the possibility that our DNA lattice model
contains some triplets of consecutive complementary pairs of nucleotides
which are also constrained (by the action of mechanisms external to the
system) to move as single particles; each of these then comprises six
nucleotides. The number of such assembled triplets is controlled by a
chemical potential $\mu$.

In view of (34) we can then ascribe the mass of an "impurity" particle to
any of these triplets in such a way that relation (31) holds, i.e.

$$ m_b \sim c_1 m_{AT} + c_2 m_{CG} \qquad (35) $$

for $c_1, c_2$ integers assuming any value in the interval [0,3] but
subject to the condition $c_1 + c_2 = 3$.

(iii) Finally, the assembled triplets are to be identified here with codons
(accompanied by their complementary nucleotides), which are the units of
information content of the genome.

In analogy to the model of the last Section, we assign site-dependent
spin-like variables $\{\sigma_i\}$ to the ordered lattice points such that

$$ \sigma_i = 1 \ \text{if } i \text{ is a codon}, \qquad \sigma_i = 0 \ \text{if } i \text{ is not a codon} \qquad (36) $$

Notice that by this procedure each codon occupies a single lattice site.
Notice also that, according to our purposes, the set $\{\sigma_i\}$ defined
for the chosen nucleotide sequences disregards differences concerning the
nature of the purine-pyrimidine base pairs.

Steps (i), (ii) and (iii) accomplish the map. Let us examine some of its
consequences.
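In practice, the assignment (36) amounts to turning the annotation of a sequence into a binary string. The Python sketch below shows one way this could be done. It is a hypothetical illustration: the input format (a list of 0-based, half-open coding intervals) and the function names are our assumptions, and each nucleotide position is labeled individually, following the statement in Section 3.3 that the sequence length is measured in nucleotide units. GenBank feature tables use 1-based, inclusive coordinates, so real annotations would need to be converted first.

```python
def coding_indicator(sequence_length, coding_intervals):
    """Return sigma, with sigma[i] = 1 inside a coding interval and 0 elsewhere.

    coding_intervals: iterable of (start, end) pairs, 0-based and half-open
    (an assumed input convention for this sketch).
    """
    sigma = [0] * sequence_length
    for start, end in coding_intervals:
        if not 0 <= start <= end <= sequence_length:
            raise ValueError(f"interval ({start}, {end}) lies outside the sequence")
        for i in range(start, end):
            sigma[i] = 1
    return sigma


def coding_region_sizes(sigma):
    """Lengths of the maximal runs of 1s, i.e. the sizes of the coding regions."""
    sizes = []
    run = 0
    for value in sigma:
        if value == 1:
            run += 1
        elif run > 0:
            sizes.append(run)
            run = 0
    if run > 0:
        sizes.append(run)
    return sizes


if __name__ == "__main__":
    sigma = coding_indicator(10, [(2, 5), (7, 9)])
    print(sigma)
    print(coding_region_sizes(sigma))
```

The list returned by `coding_region_sizes` is the raw material for a size histogram, and `sigma` itself is the input needed for a pair correlation function.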



3.2 Distribution of Coding Sequences: Mosaic Structure

We are now able to make some quantitative predictions on the structure of
eukaryotic DNA sequences with respect to the distribution of coding
regions.

According to the results of Sec. 2.2 and the above map, we expect the size
distribution of coding regions along eukaryotic gene sequences, which we
refer to here as the DNA mosaic structure [18], to follow the power law (26)
with $\beta/\beta_c > 1$, in analogy with the distribution of clusters of
isotopic impurities of the physical model in the condensed phase.

In order to find support for this conjecture, we select sequences from
GenBank and check for this property. We focus on Saccharomyces chromosomes
(SCCHRIII, SCCHRIX, YSCCHRVIN), which are among the longest
intron-containing sequences available in that data bank. These sequences
were selected from the available data on the requirement that they present
complete coding as well as noncoding regions (CDS sequences).

Figure 2 shows histograms of the sizes of the respective coding regions. To
extract representative functions for these diagrams, in each case we test
both polynomial (i.e., power-law) and exponential fittings, which are also
shown.
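The comparison between the two fitting forms can be sketched as follows. The text does not state which fitting procedure was used, so the Python example below assumes ordinary least squares on log-transformed data: a straight line in (ln l, ln P) for the power law and in (l, ln P) for the exponential. All function names are hypothetical, and empty histogram bins are simply skipped.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = intercept + slope * x; returns (slope, intercept, residual)."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two points are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, residual


def fit_power_and_exponential(sizes, counts):
    """Fit counts ~ l**eta and counts ~ exp(m*l) on the non-empty bins."""
    points = [(l, c) for l, c in zip(sizes, counts) if l > 0 and c > 0]
    log_counts = [math.log(c) for _, c in points]
    eta, _, power_residual = linear_fit([math.log(l) for l, _ in points], log_counts)
    m, _, exp_residual = linear_fit([float(l) for l, _ in points], log_counts)
    return {"eta": eta, "power_residual": power_residual,
            "m": m, "exp_residual": exp_residual}


if __name__ == "__main__":
    sizes = list(range(1, 51))
    counts = [1000.0 * l ** -1.8 for l in sizes]  # synthetic power-law histogram
    print(fit_power_and_exponential(sizes, counts))
```

On synthetic power-law data the power-law residual is essentially zero while the exponential residual is not; on real histograms with gaps, both residuals should be inspected before preferring one form.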



FIGURE 2 



3.3 Codon Pair Correlation Functions 

In the same context we examine the behavior of the codon pair correlation
functions. Figure 3 shows the variation of C(r) with codon pair distance r
for various intron-containing nucleotide sequences taken from GenBank.
These curves have been obtained using assignments (36) and assuming
ergodicity of the systems of interest, so that for each sequence we replace
the ensemble averages indicated in (28) by "temporal" averages

$$ C(|r|) = \frac{1}{L} \sum_{i=1}^{L} \sigma_i \sigma_{i+r} - \left( \frac{1}{L} \sum_{i=1}^{L} \sigma_i \right) \left( \frac{1}{L} \sum_{i=1}^{L} \sigma_{i+r} \right) \qquad (37) $$

where the sequence length L is measured in nucleotide units.

Fig. 3 shows our results for C(r), obtained by evaluating expression (37)
on the same DNA sequences studied in the last Section, with periodic
boundary conditions.
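A direct evaluation of the sequence average with periodic boundary conditions can be sketched in a few lines of Python. The function name is ours, and the code assumes the correlation is the sequence average of $\sigma_i \sigma_{i+r}$ minus the product of the two single-site averages; with periodic boundaries those two averages coincide, so the second term reduces to the squared mean.

```python
def pair_correlation(sigma, r):
    """C(r) = <sigma_i sigma_{i+r}> - <sigma_i><sigma_{i+r}> as a sequence average.

    Indices wrap around (periodic boundary conditions), so both single-site
    averages equal the overall mean of sigma.
    """
    length = len(sigma)
    if length == 0:
        raise ValueError("empty sequence")
    mean = sum(sigma) / length
    joint = sum(sigma[i] * sigma[(i + r) % length] for i in range(length)) / length
    return joint - mean * mean


if __name__ == "__main__":
    sigma = [1, 1, 0, 0]
    for r in range(3):
        print(r, pair_correlation(sigma, r))
```

This direct sum costs of order L operations per distance r; for sequences of several hundred thousand nucleotides and many distances, an FFT-based autocorrelation would be a natural alternative.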

FIGURE 3 
Comments on these results are in Sec. 4. 



4 Discussion 

The above ideas relate to the origin and function of the mosaic
organization of the genome. They are based on results obtained previously
for the equilibrium properties of a one-dimensional harmonic lattice chain
model of two isotopic components (masses $m_a$ and $m_b$), which was
proposed recently for studying isotopic fractionation. Under favorable
thermodynamic conditions, this model exhibits a condensed phase in which
segregation takes place, giving rise to macroscopic regions of just one
component [11]. In order to use these results to pursue a description of the
DNA mosaic structure, we focus on the multiscale character predicted for
the distribution of such aggregates and on the fact that the critical
temperature for the phase transition can attain very high values for a
particular set of model parameters (i.e., for $m_b \sim 3 m_a$). In this
case, the condensed phase should be stable over a broad range of
temperatures including, in particular, present-day room temperatures.

On the basis of the map of Section 3, we are then led to suggest: (i) that
the mosaic structure might have originated in a condensation of codons that
took place under prebiotic conditions, and (ii) that this mosaic structure,
being thermodynamically stable with respect to the distribution of coding
and noncoding sequences, might in fact be related to the biological
stability of the genome of higher organisms, since it is believed that the
presence of noncoding sequences ensures a relatively low probability of
harmful mutations in the genome [?].

In order to find support for the model and to check the plausibility of
the related assumptions, we exploit the fact that both the exon size
distribution functions and the codon correlation functions are predicted to
display power-law decay, according to expressions (26) and (29)
respectively. It is remarkable in this respect that the decay exponents in
both cases are predicted to be temperature dependent. This fact may be of
some relevance from a biological point of view, for it provides information
on the thermodynamics of gene formation.

We select sequences from GenBank for which calculations of these
quantities were performed. The results obtained are shown in Figures 2 and
3. In these calculations we distinguish nucleotides according to their
location in a coding or noncoding region of each sequence, in agreement
with our assumptions.

Let us focus on the results obtained for the pair correlations of the
selected chromosome sequences, and restrict the analysis to the range of
two-point distances depicted in Figure 3. We notice that the correlation
curves for all chosen sequences are smooth only in a relatively small
range, exhibiting in the remainder an oscillatory behavior due, probably,
to the absence of experimental information on coding and noncoding regions
at all representative scales [?].



Even so, we observe that the decay profile of the curves in Figure 3 can in
principle be fitted by either exponential or polynomial functions. Although
the characteristic decay parameter in each of the exponentials shown is
very small (i.e., not greater than $O(10^{-3})$, see Table 1), we should
not, on the basis of these results alone, discard the possibility of
exponential in favor of polynomial fittings. In view of this, the situation
appears to be inconclusive concerning long-range order of coding regions in
eukaryotic nucleotide sequences.

Sequence     $m_{exp}$          $n_{exp}$          $\varepsilon_{exp}$   $\eta_{exp}$   $\varepsilon_{exp} - \eta_{exp}$
SCCHRIII     -1.4 x 10^{-3}     -2.1 x 10^{-3}     -0.79                 -1.83          1.04
SCCHRIX      -7.7 x 10^{-4}     -6.2 x 10^{-4}     -0.99                 -2.15          1.16
YSCCHRVIN    -8.3 x 10^{-4}     -2.6 x 10^{-3}     -0.91                 -1.68          0.77

Table 1 - Results of polynomial fitting exponents for the exon size
distribution, $\eta_{exp}$, and the codon correlation functions,
$\varepsilon_{exp}$ (Figs. 2 and 3 respectively), along the Saccharomyces
chromosome nucleotide sequences indicated. Experimental information on
coding and noncoding regions along each of these sequences was taken from
the GenBank data bank. $m_{exp}$ and $n_{exp}$ are the corresponding
results for the exponential fitting coefficients. The last column shows the
difference $\varepsilon_{exp} - \eta_{exp}$ obtained in each case.
According to our theoretical predictions, this difference is expected to be
of order one.

Let us now focus on the numerical results obtained for the exon size
distribution. The presence of many gaps at different positions in the
histograms corroborates our conjecture above on the origin of the
oscillatory behavior of the correlation functions. Figure 2 presents both
exponential and polynomial fittings for these histograms. We observe that
in this case, too, either fitting seems adequate although, as for the
correlations, the decay parameter of each of the fitted exponentials is
very small.

Additional evidence in favor of polynomial fittings can nevertheless be
obtained in the case of the distribution functions. In fact, as can be
observed in Figure 2, rare events, namely the relatively large coding
regions present in each sequence, can only be accounted for by polynomial
fitting curves with $\beta/\beta_c > 1$, in agreement with our predictions.

Moreover, if we compare the decay profiles of the related correlation and
distribution polynomial fittings, we find that the difference between the
corresponding decay exponents is of order one for all of the sequences
studied, as predicted by our model [compare expressions (26) and (29) with
the results in the last column of Table 1].
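The consistency check described above is simple arithmetic and can be reproduced directly from the entries of Table 1. In the Python sketch below, the dictionary layout and function names are ours, and the assignment of the two exponent columns (correlation exponent first, size-distribution exponent second) is our reading of the table, chosen because it reproduces the printed last column.

```python
# Fitted exponents as read from Table 1 (column assignment is an assumption
# of this sketch: epsilon = correlation fit, eta = size-distribution fit).
TABLE_1 = {
    "SCCHRIII": {"epsilon": -0.79, "eta": -1.83},
    "SCCHRIX": {"epsilon": -0.99, "eta": -2.15},
    "YSCCHRVIN": {"epsilon": -0.91, "eta": -1.68},
}


def exponent_difference(entry):
    """Difference epsilon - eta, predicted to be 1 by Eqs. (27) and (30)."""
    return entry["epsilon"] - entry["eta"]


def predicted_epsilon(eta):
    """Eqs. (27) and (30) together give epsilon = 1 + eta."""
    return 1.0 + eta


if __name__ == "__main__":
    for name, entry in TABLE_1.items():
        print(name,
              "difference:", round(exponent_difference(entry), 2),
              "predicted epsilon:", round(predicted_epsilon(entry["eta"]), 2),
              "fitted epsilon:", entry["epsilon"])
```

The model predicts a difference of exactly one; the fitted differences scatter around that value by roughly 0.2.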

In conclusion, although our numerical studies lack information on other
scales of coding and noncoding regions of the genome, it is still possible
to obtain strong support for the use of the model of Section 2 to describe
aspects of the DNA mosaic structure, on the basis of a joint analysis of
the available experimental data on both the exon size distribution and the
codon correlation functions.

Finally, we comment on the hypotheses of Section 3. In our opinion, it is
important for the credibility of the model to find reasons for at least
some of the stated assumptions, which we review here: a) in our DNA lattice
model, some triplets of complementary nucleotide pairs are constrained to
behave as single particles in their interactions with the remaining system,
in contrast to other pairs, which are not constrained in this way; b) under
special (prebiotic) conditions, all particles can interchange positions
with each other.

One way to think about these is to imagine triplets being assembled, under
the required prebiotic conditions, by the action of external agents able to
promote the aggregation of any three consecutive complementary nucleotide
pairs at random positions on the lattice. Details of the physical
mechanisms involved in this kind of process are immaterial here, since it
is also assumed that it happens on a very short time scale compared to the
lattice relaxation time. Within this picture, the coding regions of DNA
would emerge from a possible sequential action of these external agents.
According to the results presented here, however, the stabilization of
macroscopic sequences of such triplets would be favorable only at
temperatures below $T_c$.

6 Figure Captions

Figure 1a: Cytosine-guanine (CG) and adenine-thymine (AT) complementary
nucleotide pairs. Notice that either CG or AT comprises three
aromatic rings.

Figure 1b: Single-chain mass-spring representation of a DNA nucleotide
sequence.

Figure 2: Exon size histograms for Saccharomyces chromosome nucleotide
sequences taken from GenBank: (a) SCCHRIII [315,341 base pairs
(bp)]; (b) SCCHRIX (439,885 bp); and (c) YSCCHRVIN (270,148 bp).
Also shown are polynomial, $P^*(l) \sim l^{\eta_{exp}}$ (solid line), and
exponential, $P^*(l) \sim \exp(m_{exp} l)$ (dashed line), fitting curves
as functions of the exon size l. Results for $\eta_{exp}$ and $m_{exp}$
obtained from these fittings in each case are given in Table 1.

Figure 3: Corresponding pair correlation functions $C^*(r)$ for the
sequences of Figure 2. Polynomial, $C^*(r) \sim r^{\varepsilon_{exp}}$, and
exponential, $C^*(r) \sim \exp(n_{exp} r)$, fitting curves are also
depicted, and results for $\varepsilon_{exp}$ and $n_{exp}$ are given in
Table 1.